A computer readability formula of Japanese texts for machine scoring

Authors

  • Yuka Tateisi
  • Yoshihiko Ono
  • Hisao Yamada
Abstract

A readability formula is obtained that can be used by computer programs for style checking of Japanese texts and does not require syntactic or semantic information. The formula is derived as a linear combination of the surface characteristics of the text that are related to its readability: (1) the average number of characters per sentence, (2) for each type of character (Roman alphabet, kanzi, hiragana, katakana), the relative frequency of runs (maximal strings) that consist only of that type of character, (3) the average number of characters per run for each type of run, and (4) the tooten (comma) to kuten (period) ratio. To find the proper weighting, principal component analysis (PCA) was applied to these characteristics taken from 77 sample texts. We have found a component that is related to readability. Its scores agree with empirical knowledge of reading ease. We have also obtained experimental confirmation that the component is an adequate measure of stylistic ease of reading, by the cloze procedure and by examining the average time taken to fill in one blank of the cloze texts.

Similar resources

Toward a Readability Index for Japanese Learners of EFL

In our previous research a linear readability formula was developed through a series of multiple regression analyses using four independent variables: (1) sentence length, (2) word length, (3) textbook-based word difficulty and (4) textbook-based idiom difficulty, and one dependent variable: year level of EFL textbook. The present study attempts to develop a new readability formula that include...

Do NLP and machine learning improve traditional readability formulas?

Readability formulas are methods used to match texts with the readers’ reading level. Several methodological paradigms have previously been investigated in the field. The most popular paradigm dates several decades back and gave rise to well-known readability formulas such as the Flesch formula (among several others). This paper compares this approach (henceforth “classic”) with an emerging par...

An analysis of a French as a Foreign Language Corpus for Readability Assessment

Readability aims to assess the difficulty of texts based on various linguistic predictors (the lexicon used, the complexity of sentences, the coherence of the text, etc.). It is an active field that has applications in a large number of NLP domains, among which machine translation, text simplification, text summarisation, or CALL (Computer-Assisted Language Learning). For CALL, readability tool...

Readability Assessment of Translated Texts

In this paper we investigate how readability varies between texts originally written in English and texts translated into English. For quantification, we analyze several factors that are relevant in assessing readability – shallow, lexical and morpho-syntactic features – and we employ the widely used Flesch-Kincaid formula to measure the variation of the readability level between original Engli...

Japanese Controlled Language Rules to Improve Machine Translatability of Municipal Documents

We report on experiments to test the effectiveness of controlled language (CL) rules on texts from Japanese municipal websites. We compiled a set of rules by trial and error, systematically rewriting Japanese source texts and analysing the machine translation (MT) outputs. We then employed native English speakers with little knowledge of Japanese as human evaluators and tested the understandabi...

Publication date: 1988